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PREFACE 


The work described in this report was performed by the Data Systems 
Division of the Jet Propulsion Laboratory. 

Correspondence should be directed to either R. J. Hanson, Dept, of 
Computer Science, Johnson Hall, Washington State University, Pullman, 
Washington, 99163, or F. T. Krogh and C. L. Lawson, Jet Propulsion Lab- 
oratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, 
California, 91103. 
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ABSTRACT 


This memorandum proposes a set of Fortran 
callable subprograms which will be useful in the develop- 
ment of efficient portable ANSI Fortran subprograms and 
applications programs in the area of linear algebra. 
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Introduction 


The purpose of this report is to propose a set of standard subprograms 
(modules) for performing many of the elementary operations of numerical 
linear algebra. The goal is to make it more feasible to produce efficient 
portable Fortran programs in the area of linear algebra. 

By adopting a set of standard subprogram names and parameter lists for 
certain fundamental operations it becomes worthwhile to make the effort of 
producing efficient assembly coded implementations of these subprograms for a 
wide variety of different computer systems. For example, it has been found 
[Krogh (1)] that the use of assembly coded modules in a double precision 
program for solving linear equations based on Householder transformations 
with column scaling and column interchanges reduced the execution time on a 
Univac 1108 by 15% to 30% relative to the time required when carefully written 
Fortran modules were used. 

We intend this report as a specific proposal about which discussion of 
these ideas can be focused. We hope that readers of this report will com- 
municate to us their thoughts on the general usefulness of such a set of standard 
subprograms as well as comments on specific details. 

We separate this proposed set of subprograms into two classes to 
emphasize the different levels of importance which we attach to the two classes. 

Class I contains subprograms implementing the operations that occur as 
the most frequently executed innermost loops in many of the fundamental 
algorithms of linear algebra. For this reason increases in efficiency of 
executing these operations may be expected to be particularly significant in 
improving the efficiency of the programs in which they are used. 

We feel that it would represent a significant advance in the technology of 
supporting efficient portable Fortran programs if carefully programmed 
assembly coded subprograms were available on a wide variety of computer 
systems for these Class I functions and if persons writing portable Fortran 
subprograms for the fundamental algorithms of linear algebra would use these 
Class I functions where applicable. 

The operations which we feel belong in Class I according to the above 
stated criteria are: (1) the dot product (inner product) of two vectors, (2) the 
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elementary vector operation, y := ax + y where x and y are n-vectors and a 
is a scalar, and (3) the Givens 2x2 orthogonal transformation applied to a 
2 x n submatrix. 

In the case of the Givens transformation there are presently two com- 
petitive forms in which it can be implemented, the standard (4-multiply, 2- 
add) form and a modified (2-multiply, 2- add, no square root) form. For the 
sake of discussion we give proposed subprogram specifications for both of these 
forms . 

Note that we have omitted from Class I versions of the dot product which 
require arithmetic precision extended beyond that of standard double -precision 
or the notion of double-precision complex arithmetic since these concepts are 
not at present defined in ANSI FORTRAN. There is a chance that when the 
revised Fortran standard [FORTREV] is finalized by ANSI it will define a 
double-precision complex variable type. We propose that the specification of 
subprograms using double-precision complex arithmetic should await finaliza- 
tion of FORTREV. N 

The subprograms in Class II represent operations which occur frequently 
in computational linear algebra but not in the most frequently executed loops. 
Thus their efficiency may have less impact than Class I subprograms. In fact 
however it is not easy to predict the relative importance of different sub- 
programs in a particular application and actual timings may show Class II 
subprograms to be of more importance than one might expect. For instance in 
the tests mentioned previously, [ Krogh (1)], about 1 /3 of the saving at N = 100 
and about 1/2 at N = 15 were due to the assembly coded module for the 
euclidean norm while the rest of the savings were due to the assembly coded 
modules for the inner product and the elementary vector operation. 

Aside from the question of efficiency the Class II subprograms are 
proposed for their potential convenience in the writing of programs for linear 
algebraic problems. The use of these subprograms may improve the self- 
documenting quality of the Fortran programs in which they are used and may 
tend to reduce the occurrence of coding errors by relieving the user of one 
level of coding detail. 
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For the purpose of experimenting with this approach to writing linear 
algebra programs we have produced Fortran coded and Univac 1108 assembly 
language coded subprograms for most of the Class I modules and Fortran coded 
subprograms for most of the Class II modules. 

After allowing a period of time for discussion of these ideas it is our plan 
to bring the following set of subprograms into being: 

(1) Portable Fortran coded subprograms for all of the modules except 
for the extended precision inner-product module. 

(2) Assembly coded subprograms for Class I modules for as many 
different machines as possible. 

We invite persons willing to undertake the assembly coding of these 
Class I modules on machines other than the Univac 1108 and the IBM 360/75 to 
do so and communicate the resulting code to us. We will depend upon such 
contributions to achieve wide machine coverage with this set of subprograms. 

Persons willing to write assembly versions of these subprograms for 
different machines may obtain listings of Fortran versions and test drivers, 
from the authors. 

Convention Regarding Vector Storage 

An N- vector x = (x^, . . . , x^) to be referenced by any subprogram 
described in this memorandum will be identified in the subprogram parameter 
list by a pair of symbols, say X and INCX, where X is the name of an array of 
type REAL, DOUBLE PRECISION, or COMPLEX, as appropriate, and INCX is 
an INTEGER type variable indicating the index increment between successive 
components of x in the array X. The location of components of the vector x 
within the array X is specified as follows: 

If INCX >0 X(1 + (I - 1)*INCX) = Xj I = 1, .... N 

If INCX <0 X(1 4 (N - I)* | INCX | ) = Xj 1=1 N 
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For example if N = 4 and INCX = 2 we have X(l) = X j, X(3) = x 
X(5) = x 3 , and X(7) = while if N = 4 and INCX = -2 we have X(7) = Xj, 
X(5) = x^, X(3) = x^, andX(l) = x^. 

In some of the subprograms, as noted below, INCX is required to be 
nonnegative. 

Convention Regarding N < 0 

FUNCTION subprograms return the value zero when N < 0. Exceptions 
are those FUNCTIONS which return an index. These return the index, 1, if 
N < 0. 
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Proposed Subprograms in Class I 


Operation 


Usage Statement 


Remarks 


N 

w := Z x.y. SW = SDOT(N, SX, INCX, SY, INCY) 

j-i 11 


SW.SDOT, SX( ), SY( ), single precision 


N 

w := Z x.y. DW = DSDOT(N, SX, INCX, SY, INCY) 


DW, DSDOT double precision 
SX( ),SY{ ) single precision 


N 

w := Z x.y. DW = DDOT(N, DX, INCX, DY, INC Y) 


DW, DDOT, DX{ ), DY( ) double precision 


N 

W := z x.y, CW = CDOT<N, CX, INCX, CY. INCY, 0) 

1 11 


CW.CDOT, CX( ), CY( ) complex 


N _ 

w := £ x.y. CW = CDOT(N, CX, INCX, CY, INCY, 1) 

i=l 1 1 


CW.CDOT, CX< ), CY{ ) complex 


y := axfy 
y := axfy 
y := axfy 


CALL SELVOP(N, SA, SX, INCX.SY, INCY) 
CALL D£LVOP(N, DA, DX, INCX, DY, INCY) 
CALL CELVOP(N, CA. CX, INCX, CY, INCY) 


SA,SX( ),SY( ) single precision. 
DA,DX( ), DY( ) double precision. 
CA,CX( ), CY( ) complex. 


Apply Givens CALL SG2(N, SX, INCX. S Y, INCY, C, S) 

reflection matrix 

[s -3 to the 

2 x n matrix 



SX{ ),SY( ), C, and 5 single precision 
See SGI in Class II for computation of 
C and S. 


Apply Givens CALL DG2(N, DX, INCX, DY, INCY, DC, DS) 

reflection matrix 

[c s"lto the 
[s -cj 

2 x n matrix 



DX( ), DY( ), DC and D5 are double precisim. 
See DGl in Class II for computation of 
DC and DS. 


Apply the modified CALL SMG2(N, SX, INCX, SY, INCY) [uses COMMON/ 
Givens reflection SMG3/1FLAG 1 , IFLAG2 , T1 , T2, SRA, SRB] 
transformation [ See 
Appendix A ] to the 
2 x n matrix 


SX( ), SY( ) Tl, T2, SRA, and SRB 
single precision. See SMG1 in Class 
II for computation of the quantities in 
COMMON. 


Apply the modified CALL DMG2(N, DX, INCX, DY, INCY) [uses COMMON/ 
Givens reflection DMG3/IFLAG1 , IFLAG2, DTI , DT2, DRA, DRB ] 
transformation [See 
Appendix A] to the 
2 x n matrix 


DX( ), DY( ), DTI, DT2, DRA, DRB 
double precision. See DMG1 in Class 
II for computation of the quantities in 
COMMON, 
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Proposed Subprograms in Class II 


Operation 


Usage Statement 


Remarks 


Compute ,, CALL SG1(A, B, C r S) A, B, C, and S single precision 

r := (a 2 +b 2 ) 1/Z 
c := a/r, s ; = b/r 
a := r 

defining the Givens 
reflection matrix 



Compute CALL DG1(DA, DB, DC, DS) DA, DB, DC, and D5 double precision, 

r : = (a^b 2 ) 1 ' 
c := a/r, s := b/r 
a := r 

defining the Givens 
reflection matrix 



Compute parameters CALL SMG 1 (SCA, SCB, A, B) fuses COMMON/ 
defining a modified SMG3 /IFLAG1 , 1FLAG2, T1 , T2, SRA, SRB 1 
Givens reflection. 

See Appendix A. 


SCA, SCB, A, B, T 1 , T2, SRA, SRB single 
precision; IFLAG 1 , 1FLAG2 integer. 


Compute parameters CALL DMG1 {DSCA, DSCB, DA, DB) [uses COMMON/ 
defining a modified DMG3/IFLAG1, IFLAG2, DTI, DT2, DRA, DRD] 
Givens reflection. 

See Appendix A. 


DSCA, DSCB, DA, DB, DTI, DT2, DRA. 
DRB double precision; IFLAG1, 
IFLAG2 integer. 


N 

w := b + Z x.y . DW = DXDOT(N, DX, INCX, DY, INC Y, DB, XC, 0) 

i=l 1 1 


N 

w := b+ Z x.y.+c DW = DXDOT(N, DX, INCX, DY, INCY, DB, XC, 1) 

i=l 1 1 


DW , DXDOT, DX( ), DY( ), DB, double 
precision; XC extended precision; XC 
is represented with a single precision 
array of length 5; XC is replaced by 
N 


DW, DXDOT, DX( ), DY( ), DB double 
precision; XC extended precision; XC 
N 

is replaced by b + Z x.y^+c. 


Copy x into y 

CALL 

Copy x into y 

CALL 

Copy x into y 

CALL 

Interchange x and y 

CALL 

Interchange x and y 

CALL 

Interchange x and y 

CALL 


SCOP Y(N, SX, INCX, SY, INCY) 
DCOPY<N, DX, INCX, DY, INCY) 
CCOP Y(N, CX, INCX, CY, INCY) 
SSW AP(N, SX, INCX, SY, INCY) 
DSWAP(N, DX, INCX, DY, INCY) 
CSWAP{N, CX, INCX, CY, INCY) 


SX{ ),SY{ ) single precsion 
DX( ), DY( ) double precision 
CX( ), CY( ) complex 
SX( ),SY( ) single precision 
DX( ), DY( ) double precision 
CX( ), CY( ) complex 


In all of the remaining subprograms INCX 2 0 is required. 


w 


w 



SW = S2NRM(N,SX, INCX) 


DW = D2NRM{N, DX.INCX) 


SW,S2NRM,SX{ ) 9ingle precision 


DW, D2NRM, DX( ) double precision 
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Proposed Subprograms in Class II (contd) 


Usage Statement 
SW = SC2NRM(N, CX.INCX) 


SW = SD2NRM(N, SX.INCX) 


DW = DX2NRM(N, DX.INCX) 


SW = SASUM{N, SX, INCX) 


DW = DASUM(N, DX.INCX) 


N 

w : = 2 { lReal(x.)l + llmag(x.)U SW = SCASUM(N, CX, INCX) 
i=l 1 


x := ax 
x := ax 
x ;= ax 
x : = ax 

Find indices i and 
mm 

J rnax corr esponaing 
to the smallest and 
largest components of x. 

Find indices i . and CALL DMNMX{N, DX, INCX, 1MIN, IMAX) 
> m in. 

1 corresponding 
max 6 

to the smallest and 
largest components of x. 

Find an index i I MAX = ISAMAX{N, SX, INCX) 

corresponding 
to the maximum abso- 
lute value of components 
of the vector x. 

Find an index i 1MAX = IDAMAX(N, DX, INCX) 

max 

corresponding to the 
maximum absolute value 
of components of the 
vector x. 

Find an index i max IMAX = ICAMAX(N, CX.INCX) 

identifying the compon- 
ent of the complex vector 
x having maximum sum 
of magnitudes of real and 
imaginary parts. 


CALL SSCALE{N,SA, SX.INCX) 

CALL DSCALE(N, DA, DX.INCX) 

CALL CSCALE(N,CA, CX, INCX) 

CALL CSSCAL(N, SA, CX.INCX) 

CALL SMNMX(N, SX, INCX, IMIN, IMAX) 



Remarks 

SW, SC2NRM, single precision; CX{ ) 
complex. 


SW, SD2NRM, SX( ) single precision. 
Double precision arithmetic used 
internally. 


DW, DX2NRM, DX( ) double precision. 
Extended precision arithmetic used 
internally. 


SW , SASUM, SX( ) single precision 


DW, DASUM, DX( ) double precision 

SW, SCASUM single precision; CX( ) 
complex. 

SA, SX( ) single precision 
DA, DX( ) double precision 
CA , CX{ ) complex 
SA single precision; CX( ) complex. 
SX( ) single precision 

DX( ) double precision 


SX( ) single precision 


DX( ) double precision 


CX ( ) complex 
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APPENDIX A 


A MODIFIED GIVENS TRANSFORMATION 


Introduction 


The standard Givens reflection transformation is a 2 x 2 matrix 


G = 


s are 


' ^J. For an arbitrary 2 vector x = the entries c and 

z 21 /2 r "1 

computed by r = (a + b ) ' , c = a/r and s = b/r, and this gives Gx - I q]« 


The modified Givens transformation [ Gentleman (2) and (3)] is a varia- 
tion of the standard Givens transformation. This is done by considering the 
vector x written in factored form x = D^x^ • Here D^ is a 2 x 2 diagonal 


matrix. The matrix G is constructed so that Gx = J^J. The identity GD^ = 
D^H is the critical point. (Here the matrix D^ is 2 x 2 and diagonal.) The 
2x2 matrix H has one of the two forms. 


H = 




or H 




With the standard Givens transformation formation of the product matrix 

GA ? requires 4n multiplications and 2n additions. With the modified 
Cd x n 

Givens transformation the matrix A is always considered in factored form 
A = DjAj. Noting that GA = G(D^Aj) = D^HA^), allows the matrix A^ = HA^ 
to be computed with 2n multiplications and 2n additions because of the way the 
matrix H is defined above. 


No Square Roots 

Only squares of elements of the 2x2 diagonal matrix D J are involved in 
the computation of the matrix H and (the squares of) the elements of the updated 
2x2 diagonal matrix D^. This remark obviates the need for computing 
square roots. 
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APPENDIX A (contd) 


Rescaling 

The elements of the diagonal matrices generated will generally decrease 
as further transformations are constructed. This may require occasional 

_ j 

rescaling to avoid underflow danger. The identity D^A^ = (D^S )(SA^) = D^Aj 
(S an arbitrary 2x2 diagonal matrix) shows that the factorization can be 
rescaled whenever desired. 

We have provided for rescaling to be performed whenever either diagonal 

- 1 2 

term of the matrix is less than 2 

The matrix can be initialized to be the 2x2 identity matrix, for 
example . 

Coding Details 

The modified Givens transformation is called with the Fortran statement 


CALL SMG1 (SC A, SCB, A, B ) 

The standard Givens transformation which is implicitly constructed in this 
subroutine eliminates the second entry of the 2 -vector 


sca 1 / 2 *a 

scb 1//2 *b 


-A" 

ix H r 

Lb J 


eplaces A in storage. The squares 


The first entry of the product matrix 
of the entries of the diagonal matrix replace SCA and SCB in storage. 

‘A 


When rescaling occurs both the first entry of H 
terms SCA and SCB are appropriately modified. 


B 


and the diagonal 


The transformation is applied to a 2 x n matrix D 
Fortran statement. 


x. ... ,x 
1 n 


with the 


CALL SMG2(N, X, INCX, Y, INCY). 
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APPENDIX A (contd) 


The various parameters necessary to apply this transformation are 
communicated from SMG1 to SMG2 by means of the COMMON statement 

COMMON /SMG3/IFJLAG1, IFLAG2, Tl, T2, SRA, SRB 

These parameters are defined as follows: 

IFLAGl Indicates the form of the matrix H: 

IFLAGl = 0 when H = 



IFLAGl = 1 when H = 

IFLAG2 Rescaling indicator 

IFLAG2 = 0 if no rescaling is required 
IFLAG2 = 1 when rescaling is required 

Tl, T2 Nonunit entries of the appropriate form of the 2x2 

matrix H. 



SRA, SRB 


Diagonal entries of the diagonal matrix S which will be 


applied to the matrix H 



when IFLAG2 = 1 . 


The values of SRA and SRB, when IFLAG2 = 1, are 

„24 „12 . 

either 2 ,2 , or 1 . 
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APPENDIX B 


SOME IMPLEMENTATIONS OF DSDOT 

To provide concrete examples of our present thinking on implementations 
of these subprograms we include listings of three versions of DSDOT. Figure B-l 
is Fortran code, Figure B-2 is Univac 1108 assembly code and Figure B-3 is 
IBM 360/75 assembly code. 

In the Fortran code note the use of DBLE to achieve double length 
accumulation. With some Fortran compilers double length accumulation would 
occur even without the DBLE just because DSDOT is of type DOUBLE 
PRECISION. The DBLE is necessary however on some compilers. This 
leads to unnecessary inefficiency on some compilers which would not need the 
DBLE. This is an example of the present impossibility of writing portable 
Fortran code which will be efficient on a variety of machines. 

Another problem arises in handling indexing with the index spacing 
specified by the parameters INCX and INCY. The statement at line 19 gen- 
erates quite efficient machine code on the Univac 1108 but is not within the 
specifications of ANS Fortran. The alternative code given as comments in 
lines 21-23 is legal ANS Fortran but generates much less efficient code on the 
Univac 1108, 

The comment cards "C1 M , M C2 M , and "C3" delimit the two different 
versions of Fortran code. The comment cards "C4 n and M C5" delimit the 
initial descriptive comments. We are using these comment cards "C1 M through 
"C5" as reference marks for performing certain editing operations on the 
whole set of subprograms. 
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1 


DOUBLE PRECISION FUNCTION DSD0T< H .X. INCX. Y . INCY > 



2 

C4 




3 

C 

DOT PRODUCT OF SNGL. PREC. VECTORS USING DELE. PREC. 

ACCUMULATION. 

4 

C 




S 

C 

COMPUTES DSDOT - SUM FROM 1 TO N OF Ad ) *Bd ) WHERE 



6 

C 

AC I > - X< 1-INCX+I*INCX) IF INCX .GE. 0 



7 

c 

AC K > - X< 1-N*INCX4-I*INCX) IF INCX .LT. 0 



6 

c 

B ( I ) DEFINED SIMILARLY WITH X AND INCX REPLACED BY 

Y 

AND INCY 

9 

cs 




to 


REAL XC 1 > . Y< 1 J 



11 


DSDOT - 0 .DO 



12 


I P( N .LE. 0 > RETURN 



13 


IF( INCX .LT. 0) GO TO 2 



14 


EX - -INCX + 1 



IS 


GO TO 4 



10 


2 EX - -N* INCX ♦ 1 



1? 


4 IF (INCY .LT. 0) CO TO I 



18 


Xt - -INCY + 1 



19 


GO TO 8 



20 


6 KY - -N* INCY *■ 1 



21 


• CONTINUE 



22 


DO 10 I ■ 1,N 



23 

Cl 




24 


DSDOT - DSDOT + DBLE(X( I* INCX 4- KX) )*DBLE( Y( l* INCY 

+ KYI) 

25 

C2 




26 

c 

JX - !• INCX * EX 



27 

c 

JY - I • INCY ♦ EY 



28 

c 

DSDOT - DSDOT 4- DBLECXf JX) >-DBLE(Y(JY> ) 



29 

C3 




30 


10 CONTINUE 



31 


RETURN 



32 


END 




Figure B-l. Fortran Code for DSDOT 

) 


1 


AXRf 



2 

»(1). 




4 

INNER PRODUCT OF SNGL. PREC. 

VECTORS USING DBLE. PREC. ACCUMULATION. 

5 

6 

. TO BE 

USED AS FORTRAN FUNCTION 

DSDOTtN.X. INCX. Y. INCY) 

7 

. WHERE 

DSDOT IS 

OF TYPE DOUBLE 

PRECISION. X AND Y ARE OF TYPE REAL. 

8 

. AND 

DSDOT- SUM FROM 1-1 TO N OF Ad)*B(l) WHERE 

9 

. Ad) 

= X(i-1NCX+!*INCX) IF 

INCX.GE.O 

10 

. A ( I ) 

- X( 1 -N* INCX+I* INCX) IF 

1 NCX ; LT . 0 

1 1 

. B( I ) 

DEFINED 

SIMILARLY. WITH 

X AND INCX REPLACED BY Y AND INCY 

13 

DSDOT* 

DSL 

A0. 72 

. STORE 0 IN A0 AND A1 

14 


LR 

R3 , *0 ,X1 1 

. STORE N IN R3 

1 5 


JCD 

R3.NPOS 

. STORE N-1 IN R3 AND TEST N 

16 


J 

6. XII 

. EXIT IF N.LE.O 

17 

NPOS 

DS 

A6 . SAVE 

. SAVE REGISTERS A6 AND A7 

18 


LA.U 

A2.*1 .XI 1 

. LOAD ADDRESS OP X 

19 


LA.U 

A3.*3.X11 

. LOAD ADDRESS OF Y 

20 


LX I 

At. *2. XII 

. LOAD INCREMENT ON X 

21 


LX I 

A3. *4. XII 

. LOAD INCREMENT ON Y 

22 


JP 

A2. TINCY 

. TEST IF INCX.GE.O 

23 


LNA 

A4.A2 

. ADD -INCXoCN-1 ) 

24 


SSA 

A4.16 

TO THE BASE 

25 


MS I 

A4 ,R3 

ADDRESS 

20 


AH 

A2 . A4 

FOR X 

27 

TINCY 

JP 

A3. LOOP 

. TEST ir INCY.GE.O 

26 


LNA 

A4 .A3 

. ADD -INCY* (N-T ) 

29 


SSA 

A4. 18 

TO THE BASE 

30 


MSI 

A4.R3 

ADDRESS 

31 


AH 

A3.A4 

FOR Y 

32 




BEGIN LOOP TO FORM INNER PRODUCT 

33 

LOOP 

FEL 

A4 . 0 . *A2 . LOAD X. CONVERT TO DOUBLE. AND INC. INDEX 

34 


FEL 

A8 . 0 . *A3 . LOAD Y, CONVERT TO DOUBLE. AND INC. INDEX 

35 


DFM 

A4 . A6 

. MULTIPLY X TIMES Y 

36 


DFA 

A0.A4 

. ACCUMULATE INNER PRODUCT 

37 


JGD 

R3 , LOOP 

. END OF INNER PRODUCT LOOP 

36 


DL 

A6 , SAVE 

. RESTORE REGISTERS A6 AND A7 

39 


J 

6. XII 

. RETURN FOR N.CT.O 

40 





41 

«(0) 




42 

SAVE 

4- 

0D 

. PLACE TO SAVE A6 AND A7 

1 43 


END . 




Figure B-2. UNIVAC 1108 Assembly Code for DSDOT 
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1 

2 - DOUBLE PRECISION FUNCTION DSDOT ( N . X . I NCX . Y . I NCY ) 

3 • 

4 - COMPUTED AS SUM FROM I -I TO N OF A( I )-»( I ) 

5 - WHERF. X( ) AND Y< ) ARE OF TYPE REAL WITH 

6 * 

7 ■ X(1 - INCX* I • INCX) = AC l> IF 1NCX .CT. 0 

8 * 

9 . X( 1 ~ N* I NCX-*- 1 ■ I NCX ) = A(l> IF 1NCX .LE. 0 

10 • 

11 • SIMILAR DEFINITIONS FOR Y< ) AND BCD 

1 2 * 

13 . THE SUM IS ACCUMULATED IN DOUBLE PRECISION 

1 4 

13 . WRITTEN BY R. J. HANSUlS , 27 JULY 1973 

16 

17 . 


18 

DSDOT 

CSECT 



19 


US INC 

•.15 

BASE RECI5TER ASSIGNMENT 

20 


SAVE 

(1.9)..* 

SAVE USED REGISTER CONTENTS 

21 


LM 

2.6 .0(1 > 

GET ALL POINTERS TO ARCS. IN REGS. 

22 


SDR 

0.0 

SET DSDOT = 0 

23 


L 

7.0(2) 

GET N AND 

24 


LTR 

7.7 

(SET CONDITION CODES) AND SEE 

2 3 


BNP 

DONE 

IF ITS .LE. 0. DEFINE DSDOT - 0 

26 


L 

2.7 

SAVE N FOR COUNTING 

27 


BCTR 

7.0 

COMPUTE N - 1 

26 


L 

6.0(6) 

GET INCY AND SEE 

29 


SLA 

6.2 

(SET CONDITION CODES AND MULTIPLY 

30 


BP 

INCXT 

IF ITS CT. 0 

31 


LR 

9.6 

CET INCY • 4 AND 

32 


MR 

8,7 

COMPUTE INCY* (N- 1 ) • 4 AND 

33 


SR 

S . 9 

FIX BASE ADDRESS OF Y< ) 

34 

INCXT 

L 

4.0(4) 

GET INCX AND SEE 

33 


SLA 

4.2 

(SET CONDITION CODES AND MULTIPLY 

36 


BP 

LOOP 

IF ITS .GT. 0 

37 


LR 

i. 4 

CET INCX • 4 AND 

36 


MR 

6.7 

COMPUTE I NCX* ( N - 1 ) • 4 AND 

39 


SR 

3.9 

FIX BASE ADDRESS OF X< ) 

40 

LOOP 

LF. 

2. 0(0.3) 

GET A ( I ) AND 

4 1 


ME 

2. 0(9. 5) 

MULTIPLY BY B( I) AND 

42 


ADR 

0.2 

ACCUMULATE INNER PRODUC! 

43 


AR 

3.4 

THEN FIX THE 

44 


AR 

5,6 

ADDRESSES AND CHECK FOR 

45 


BCT 

2 .LOOP 

END OF THE LOOP 

46 

DONE 

RETURN ( 1 . 9 ) ,T 

CO BACK TO THF. CALLING PROCRAM 

47 


END 




Figure B-3. IBM 360/75 Assembly Code for DSDOT 
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